We present a systematic comparison of various iterative methods for obtaining the fermion propagator with both
overlap and twisted mass fermions at fixed pion mass in the quenched approximation. Taking the best available
algorithm in each case, we find that calculations with the overlap operator are a factor of 20-40 more expensive
than with the twisted mass operator at the parameter values considered here. For the overlap operator we also
compare the efficiency of various methods for calculating the topological index.



1. INTRODUCTION 

The Wilson twisted mass (Wtm) lattice fermion action of an SU(2) flavour
doublet of mass-degenerate quarks with maximal twist angle has the form

$$D_{\rm tm}(\mu) = D_w(m_{\rm tm}) + i\gamma_5\tau_3\,\mu ,$$

where $D_w$ is the Wilson Dirac operator, $m_{\rm tm}$ the bare quark mass
tuned to its critical value and $\mu$ the twisted quark mass. For our purpose
it is sufficient to consider only one of the two flavours.
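The massive overlap operator is defined as

$$D(m_{\rm ov}) = \left(1 - \frac{m_{\rm ov}}{2\rho}\right) D + m_{\rm ov} ,$$

where

$$D = \rho\left(1 + \gamma_5\,\mathrm{sign}(Q(\rho))\right)$$

is the massless overlap operator with $Q(\rho) = \gamma_5 D_w(-\rho)$,
$\rho = 1.6$ and $m_{\rm ov}$ the bare quark mass.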
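As an illustration of these definitions, the following sketch builds the massive overlap operator on a small toy problem. It is not the lattice operator: it assumes a chiral basis with diagonal $\gamma_5$, uses a random Hermitian matrix as a stand-in for $Q(\rho)$ (no gauge field or Wilson operator is implemented), and evaluates the sign function exactly through an eigendecomposition. The function name `toy_overlap` and all sizes are hypothetical.

```python
import numpy as np

def toy_overlap(n=8, rho=1.6, m_ov=0.10, seed=0):
    """Toy massive overlap operator; Q is a random Hermitian stand-in (assumption)."""
    rng = np.random.default_rng(seed)
    dim = 2 * n
    # gamma_5 in a chiral basis: +1 on the first n components, -1 on the rest.
    g5 = np.diag(np.concatenate([np.ones(n), -np.ones(n)])).astype(complex)
    # Random Hermitian matrix standing in for Q(rho) = gamma_5 D_w(-rho).
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q = (a + a.conj().T) / 2
    # Exact sign(Q) from the eigendecomposition Q = V diag(w) V^dagger.
    w, v = np.linalg.eigh(q)
    sign_q = (v * np.sign(w)) @ v.conj().T
    eye = np.eye(dim)
    d_massless = rho * (eye + g5 @ sign_q)
    d_massive = (1 - m_ov / (2 * rho)) * d_massless + m_ov * eye
    return g5, d_massless, d_massive

rho = 1.6
g5, d0, dm = toy_overlap(rho=rho)
# Ginsparg-Wilson relation in the form D^dagger D = rho (D + D^dagger).
gw_residual = np.linalg.norm(d0.conj().T @ d0 - rho * (d0 + d0.conj().T))
# gamma_5-hermiticity of the massive operator: gamma_5 D gamma_5 = D^dagger.
g5h_residual = np.linalg.norm(g5 @ dm @ g5 - dm.conj().T)
print(gw_residual, g5h_residual)
```

Both residuals should vanish up to rounding, which is a quick consistency check on the normalisation of $D$ and $D(m_{\rm ov})$ used here.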

For both Wtm and the overlap operator we 
present results using the inversion algorithms 
GMRES(m), CG(NE), CGS and BiCGstab [T] 
and additionally MR [T] and SUMR 03 for the 
overlap operator. 



2. SETUP 

Our set-up consists of two quenched ensembles of 20 lattices each, with
$V = 12^4$ and $16^4$, generated with the Wilson gauge action at
$\beta = 5.85$, corresponding to a lattice spacing of $a \approx 0.12$ fm.

We invert both the twisted mass and the overlap operator on two point-like
sources $\eta$ with two different bare quark masses and require a stopping
criterion $\|Ax - \eta\|^2 < 10^{-14}$.
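To make the stopping criterion concrete, here is a minimal sketch of CG on the normal equations, CG(NE), that terminates on the squared residual of the original system. The matrix is a hypothetical well-conditioned random matrix, not a Dirac operator, and the function name `cgne` is an assumption made for illustration.

```python
import numpy as np

def cgne(a, eta, tol=1e-14, max_iter=1000):
    """CG on the normal equations A^dagger A x = A^dagger eta.

    Stops as soon as the squared residual ||A x - eta||^2 drops below tol.
    """
    x = np.zeros_like(eta)
    s = eta.copy()                 # residual of the original system, eta - A x
    r = a.conj().T @ s             # residual of the normal equations
    p = r.copy()
    rr = np.vdot(r, r).real
    for k in range(1, max_iter + 1):
        q = a @ p
        alpha = rr / np.vdot(q, q).real
        x += alpha * p
        s -= alpha * q
        if np.vdot(s, s).real < tol:
            return x, k
        r = a.conj().T @ s
        rr_new = np.vdot(r, r).real
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x, max_iter

# Hypothetical test problem: well-conditioned matrix and a point-like source.
rng = np.random.default_rng(1)
n = 50
g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
a = 2.0 * np.eye(n) + 0.3 * g / np.sqrt(n)
eta = np.zeros(n, dtype=complex)
eta[0] = 1.0
x, n_iter = cgne(a, eta)
res2 = np.linalg.norm(a @ x - eta) ** 2
print(n_iter, res2)
```

Each iteration applies both $A$ and $A^\dagger$, which is the cost that the chiral-sector trick for the overlap operator reduces.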

The quark masses $m_{\rm ov} = 0.10$ and $m_{\rm ov} = 0.03$ for the overlap
and $\mu = 0.042$ and $\mu = 0.0125$ for the twisted mass operator are chosen
such that the corresponding pion masses for the twisted mass and the overlap
operator are matched:

    m_pi [MeV]    overlap m_ov    twisted mass mu
    720           0.10            0.042
    390           0.03            0.0125



'Poster presented by K.J. 



We are working in a chiral basis and the two sources are chosen so that they
correspond to sources in the two different chiral sectors. For the CG(NE)
algorithm using the overlap operator we can then use the relation

$$P_\pm D(m_{\rm ov})^\dagger D(m_{\rm ov}) P_\pm = 2\rho\, P_\pm D\!\left(m_{\rm ov}^2/(2\rho)\right) P_\pm ,$$

where $P_\pm$ are the chiral projectors, since the inversions take place in a
given chiral sector. This saves a factor of two with respect to the general
CG(NE) case, and in the following we denote this algorithm by CG$_\chi$.
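The chiral-sector relation can be checked numerically on a toy matrix. The sketch below assumes a chiral basis with diagonal $\gamma_5$ and a random Hermitian stand-in for $Q(\rho)$; the helper name `d_massive` and the matrix size are hypothetical. It compares both sides of the relation for $P_+$ and $P_-$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, m_ov = 6, 1.6, 0.10
dim = 2 * n
g5 = np.diag(np.concatenate([np.ones(n), -np.ones(n)])).astype(complex)
a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
w, v = np.linalg.eigh((a + a.conj().T) / 2)   # random Hermitian stand-in for Q
sign_q = (v * np.sign(w)) @ v.conj().T
d0 = rho * (np.eye(dim) + g5 @ sign_q)        # massless overlap operator D

def d_massive(m):
    """Massive overlap operator D(m) = (1 - m/(2 rho)) D + m."""
    return (1 - m / (2 * rho)) * d0 + m * np.eye(dim)

max_dev = 0.0
for sign in (+1, -1):
    proj = (np.eye(dim) + sign * g5) / 2      # chiral projector P_+ or P_-
    dm = d_massive(m_ov)
    lhs = proj @ dm.conj().T @ dm @ proj
    rhs = 2 * rho * proj @ d_massive(m_ov**2 / (2 * rho)) @ proj
    max_dev = max(max_dev, np.linalg.norm(lhs - rhs))
print(max_dev)
```

The right-hand side needs a single application of the overlap operator where the left-hand side needs two, which is the origin of the factor of two.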

The computational bottleneck for the inversions of the overlap operator is
the computation of the approximation of the sign function sign(Q). Our
approximations use Chebyshev polynomials of order $O(200-300)$. In order to
achieve this we project out the lowest 20 and 40 eigenvectors of the hermitian
Wilson Dirac operator $Q(\rho)$ on the $12^4$ and $16^4$ lattice, respectively.
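The link between projecting out low modes and the polynomial order can be illustrated with a scalar sketch. It assumes the spectrum of $Q^2$ has been rescaled into $[\epsilon, 1]$ and approximates $x^{-1/2}$ there by Chebyshev interpolation, so that $\lambda\, p(\lambda^2) \approx \mathrm{sign}(\lambda)$. The values of $\epsilon$, the tolerance and the function name `min_chebyshev_order` are hypothetical and not taken from the text.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def min_chebyshev_order(eps, tol=1e-6, step=5, max_order=400):
    """Smallest tested degree n with max |sqrt(x) p_n(x) - 1| < tol on [eps, 1].

    p_n interpolates x**-0.5 at Chebyshev points, so lam * p_n(lam**2)
    approximates sign(lam) for eps <= lam**2 <= 1.
    """
    x = np.linspace(eps, 1.0, 4001)
    for n in range(step, max_order + 1, step):
        p = Chebyshev.interpolate(lambda t: t ** -0.5, n, domain=[eps, 1.0])
        if np.max(np.abs(np.sqrt(x) * p(x) - 1.0)) < tol:
            return n
    return None

# Raising the lower spectral bound (as low-mode projection does) lowers the order.
order_small_gap = min_chebyshev_order(eps=0.01)
order_large_gap = min_chebyshev_order(eps=0.04)
print(order_small_gap, order_large_gap)
```

Removing the lowest eigenvectors of $Q$ from the polynomial part raises the effective lower bound $\epsilon$, and the required degree drops accordingly.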

It is well known that by adapting the accuracy of the approximation during
the course of the iteration one can speed up the inversions by large factors,
since a reduction in the order of the polynomial enters multiplicatively in
the total cost of the inversion. In the following we denote these algorithms
by the subscript ap for adaptive precision. The precision is adapted so as to
ensure that no contributions to the sign-function approximation are calculated
which are not needed at the present stage of the algorithm. In the case of
CG$_{\rm ap}$ we simply calculate contributions up to the point where they are
smaller by a factor $O(10^{-2})$ than the desired residuum. This requires the
full polynomial only at the beginning of the CG search, while towards the end
of the search we use polynomials of order $O(10)$. In the case of
MR$_{\rm ap}$, we start with a low-order approximation of $O(10)$ right from
the beginning. Subsequently the introduced error is corrected from time to
time by calculating the true residuum to full precision.
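The idea of iterating with a cheap, inexact operator and periodically correcting with the true residuum can be sketched as a defect-correction loop. This is only loosely in the spirit of MR$_{\rm ap}$ and is not the implementation used for the timings: the low-order sign-function approximation is mimicked by a hypothetical perturbed matrix `a_cheap`, and a direct solve stands in for the inner MR iterations.

```python
import numpy as np

def solve_with_cheap_operator(a_exact, a_cheap, b, tol=1e-14, max_outer=50):
    """Iterate with a cheap operator, correcting with the exact residual."""
    x = np.zeros_like(b)
    for k in range(max_outer):
        r = b - a_exact @ x                    # true residual (expensive step)
        if np.vdot(r, r).real < tol:
            return x, k
        x = x + np.linalg.solve(a_cheap, r)    # cheap correction step
    return x, max_outer

rng = np.random.default_rng(3)
n = 40
a_exact = 2.0 * np.eye(n) + 0.3 * rng.normal(size=(n, n)) / np.sqrt(n)
# Hypothetical low-precision operator: the exact one plus a 1e-2 perturbation.
a_cheap = a_exact + 1e-2 * rng.normal(size=(n, n)) / np.sqrt(n)
b = np.zeros(n)
b[0] = 1.0
x, n_outer = solve_with_cheap_operator(a_exact, a_cheap, b)
res2 = np.linalg.norm(a_exact @ x - b) ** 2
print(n_outer, res2)
```

In this toy setting each outer step reduces the residual roughly by the relative size of the perturbation, so the expensive operator is applied only a handful of times.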





                       Overlap     Wtm        rel. factor
    12^4, 720 MeV      48.8(6)     2.6(1)     18.8
    12^4, 390 MeV      142(2)      4.0(1)     35.4
    16^4, 720 MeV      225(2)      9.0(2)     25.0
    16^4, 390 MeV      653(6)      17.5(6)    37.3

Table 1. Best absolute timings in seconds.
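The relative factors are the ratios of the overlap to the Wtm timings. The short check below recomputes them from the rounded central values in Table 1; small deviations from the quoted factors are expected, since the table entries themselves are rounded.

```python
# Timings in seconds copied from Table 1: (overlap, Wtm, quoted relative factor).
table = {
    ("12^4", 720): (48.8, 2.6, 18.8),
    ("12^4", 390): (142.0, 4.0, 35.4),
    ("16^4", 720): (225.0, 9.0, 25.0),
    ("16^4", 390): (653.0, 17.5, 37.3),
}
ratios = {key: t_ov / t_tm for key, (t_ov, t_tm, _) in table.items()}
for (volume, m_pi), ratio in ratios.items():
    print(f"{volume}, {m_pi} MeV: overlap/Wtm = {ratio:.1f}")
```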



3. RESULTS 

We first present the relative timings of the algorithms for both the overlap
and twisted mass operators in Figure 1, where in each case the timings are
relative to the fastest algorithm. All timings were obtained on one node of
the Juelich Multiprocessor (JUMP) IBM p690 Regatta using 32 processors.

Next we compare directly the absolute and relative cost for the overlap and
twisted mass operators in Table 1, where we pick in each case the best
available algorithm: GMRES$_{\rm ap}$ for the overlap and CGS for the twisted
mass operator.

4. TOPOLOGICAL CHARGE COMPUTATION

For the computation of the topological index 
it is important to note that the determination of 
the chiral sector which contains the zero-modes 
comes for free when one uses the CG-algorithm 
for the inversion. From the CG-coefficients which 
are obtained during the iteration one can build 
up a tridiagonal matrix related to the underly- 
ing Lanczos procedure. The eigenvalues of this 
matrix approximate the extremal eigenvalues of 
the operator and it turns out that the lowest 5-10 
eigenvalues are approximated rather accurately. 
By estimating the eigenvalues once in each chiral 
sector and by pairing them accordingly it is pos- 
sible to identify the chiral sector which contains 
zero modes. 
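The construction of the tridiagonal matrix from the CG coefficients can be sketched as follows, using the standard relation between CG and Lanczos: with step lengths $\alpha_k$ and direction coefficients $\beta_k$, the diagonal entries are $1/\alpha_k + \beta_{k-1}/\alpha_{k-1}$ (the second term is absent for $k = 0$) and the off-diagonal entries are $\sqrt{\beta_k}/\alpha_k$. The test matrix is a hypothetical diagonal matrix with two well-separated low eigenvalues, chosen so the exact spectrum is known; it is not the overlap operator, and the name `cg_with_lanczos` is an assumption.

```python
import numpy as np

def cg_with_lanczos(a, b, tol=1e-24, max_iter=None):
    """Run CG on a Hermitian positive definite matrix and return the solution
    together with the eigenvalues of the associated Lanczos tridiagonal matrix."""
    max_iter = max_iter or len(b)
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = np.vdot(r, r).real
    rr0 = rr
    alphas, betas = [], []
    for _ in range(max_iter):
        ap = a @ p
        alpha = rr / np.vdot(p, ap).real
        x += alpha * p
        r -= alpha * ap
        rr_new = np.vdot(r, r).real
        alphas.append(alpha)
        if rr_new < tol * rr0:          # relative squared residual reached
            break
        beta = rr_new / rr
        betas.append(beta)
        p = r + beta * p
        rr = rr_new
    m = len(alphas)
    t = np.zeros((m, m))
    for k in range(m):
        t[k, k] = 1.0 / alphas[k]
        if k > 0:
            t[k, k] += betas[k - 1] / alphas[k - 1]
        if k < m - 1:
            t[k, k + 1] = t[k + 1, k] = np.sqrt(betas[k]) / alphas[k]
    return x, np.linalg.eigvalsh(t)

# Hypothetical spectrum: two isolated low modes below a dense bulk.
eigs = np.concatenate([[0.1, 0.2], np.linspace(1.0, 2.0, 28)])
a = np.diag(eigs)
b = np.ones(len(eigs))
x, ritz = cg_with_lanczos(a, b)
print(ritz[:2])
```

The eigenvalue estimates come essentially for free, since $\alpha_k$ and $\beta_k$ are computed during the inversion anyway and the tridiagonal matrix is tiny compared with the operator.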

In order to determine the topological charge 
itself one has to compute the lowest eigenval- 
ues of the overlap operator as well as their de- 
generacies. We have implemented two different 
algorithms, one based on the Ritz-Jacobi (RJ) 
method 1 115) and the other based on the Jacobi- 
Davidson (JD) method 6 . Both of them are im- 
proved by looking separately in the two chiral sec- 
tors using adaptive precision. 

We compare the two algorithms on a $12^3 \times 24$ and a $16^4$ lattice at
$\beta = 5.85$ and $\rho = 1.6$ with five configurations each. Both methods
first determine the chiral sector containing zero modes, and subsequently all
the zero modes in this sector are calculated. With the JD method, two non-zero
modes have additionally been computed. The JD timings relative to RJ are
0.73(9) on the $12^3 \times 24$ lattice and 0.93(7) on the $16^4$ lattice.

The performance of both methods is comparable, with slight advantages for the
JD method. The speedup for the two methods in an MPI-parallel program is equal.

Figure 1. Relative timings of the various algorithms for the overlap (left column) and the twisted mass
operator (right column).

5. CONCLUSIONS AND OUTLOOK 

We find that for Wtm fermions CGS appears to be the fastest inversion
algorithm, while for overlap fermions it is GMRES$_{\rm ap}$, for the
parameters investigated here. In a direct comparison between twisted mass and
overlap fermions, the latter are a factor of 20-40 more expensive if one
compares the best available algorithms in each case.

For the index computation a clever combination of the methods described above
looks most promising. Finally we note that one can apply various kinds of
preconditioning to all the algorithms investigated here. For the twisted mass
operator we expect even/odd or SSOR preconditioning to be efficient, while for
the overlap operator low-mode preconditioning should be very effective for low
quark masses.